CoBWeb - A Crawler for the Brazilian Web

Authors

  • Altigran Soares da Silva
  • Eveline A. Veloso
  • Paulo Braz Golgher
  • Berthier A. Ribeiro-Neto
  • Alberto H. F. Laender
  • Nivio Ziviani
Abstract

One of the key components of current Web search engines is the document collector. This paper describes CoBWeb, an automatic document collector, whose architecture is distributed and highly scalable. CoBWeb aims at collecting large amounts of documents per time period, while observing operational and ethical limits in the crawling process. CoBWeb is part of the SIAM (Information Systems in Mobile Computing Environments) search engine which is being implemented to support the Brazilian Web. Thus, several results related to the Brazilian Web are presented.


Similar resources

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable attackers to hide and obfuscate malicious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

CobWeb: Tailorable, Analyzable Rules for Collaborative Web Use

CobWeb is a collaborative web browsing system that allows the rules governing the interactions of multiple users (the collaboration protocol) to be externally specified and dynamically changed. We explain the architecture of the CobWeb implementation and conclude by showing how the Java classes defining collaboration protocols are generated from visual formal specifications. We note also that though...

CobWeb: Visual Design of Collaboration Protocols for Dynamic Group Web Browsing

CobWeb is a collaborative web browsing system that allows the rules governing the interactions of multiple users (the collaboration protocol) to be externally specified and dynamically changed. We explain the architecture of the CobWeb implementation, and conclude by showing how the Java classes defining collaboration protocols are generated from visual formal specifications. We note also that,...

Envia garciai, a New Genus and Species of Mygalomorph Spiders (Araneae, Microstigmatidae) from Brazilian Amazonia

The genus Envia, comprising only the new species Envia garciai, is proposed. These small mygalomorph spiders were abundantly collected in soil cores and litter samples in primary rain forests near Manaus, Amazonas, Brazil.

Publication date: 1999